Abstract


The probability distribution of a Markov chain is viewed as the information state of an additive optimization problem. This optimization problem is then generalized to a product form whose information state gives rise to a generalized notion of probability distribution for Markov chains. The evolution and the asymptotic behavior of this generalized or “risk-sensitive” probability distribution are studied in this paper, and a conjecture is proposed regarding the asymptotic periodicity of risk-sensitive probability. The relation between a set of simultaneous non-linear equations and the set of periodic attractors is analyzed.
1 Introduction


It is well known that the probability distribution of an ergodic Markov chain is asymptotically stationary, independent of the initial probability distribution, and that the stationary distribution is the solution to a fixed point problem [5]. This probability distribution can be viewed as the information state for an estimation problem arising from Maximum A Posteriori Probability (MAP) estimation of a Markov chain for which no observation is available.

Risk-sensitive filters [7]-[13] take into account the “higher order” moments of the estimation error. Roughly speaking, this follows from the analytic property of the exponential, $e^x = \sum_{k=0}^{\infty} x^k/k!$, so that if $\Psi$ stands for the sum of the error functions over some interval of time, then

$$E[\exp(\gamma\Psi)] = E\big[1 + \gamma\Psi + \gamma^2\Psi^2/2 + \cdots\big].$$

Thus, at the expense of the mean error cost, the higher order moments are included in the minimization of the expected cost, reducing the “risk” of large deviations and increasing our “confidence” in the estimator. The parameter $\gamma > 0$ controls the extent to which the higher order moments are included. In particular, the first order approximation as $\gamma \to 0$, $E[\exp(\gamma\Psi)] \approx 1 + \gamma E[\Psi]$, indicates that the original minimization of the sum criterion, i.e., the risk-neutral problem, is recovered as the small risk limit of the exponential criterion.
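The small risk limit can be checked numerically. The sketch below is illustrative only: it assumes a hypothetical discrete cost $\Psi$ taking the values 0, 1, 2, 3 with made-up probabilities (not data from this paper) and compares $E[\exp(\gamma\Psi)]$ with its first- and second-order expansions as $\gamma$ shrinks.

```python
import math

# Hypothetical discrete cost Psi (illustrative values, not from the paper).
VALUES = [0.0, 1.0, 2.0, 3.0]
PROBS = [0.4, 0.3, 0.2, 0.1]


def moment(k):
    """E[Psi^k] for the illustrative distribution."""
    return sum(p * v ** k for v, p in zip(VALUES, PROBS))


def exp_criterion(gamma):
    """Risk-sensitive criterion E[exp(gamma * Psi)]."""
    return sum(p * math.exp(gamma * v) for v, p in zip(VALUES, PROBS))


for gamma in (1.0, 0.1, 0.01, 0.001):
    exact = exp_criterion(gamma)
    first = 1 + gamma * moment(1)                 # risk-neutral (mean) term only
    second = first + gamma ** 2 * moment(2) / 2   # adds the second moment
    print(f"gamma={gamma:<6} exact={exact:.6f} 1st={first:.6f} 2nd={second:.6f}")
```

As $\gamma$ decreases, the gap between the exact value and $1 + \gamma E[\Psi]$ shrinks like $\gamma^2$, which is the sense in which the risk-neutral criterion is the small risk limit.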

Another point of view is that the exponential function has the unique algebraic property of converting a sum into a product. In this paper we show that a notion of probability for Markov chains follows from this point of view which, due to its connection to risk-sensitive filters, will be termed “risk-sensitive probability” (RS-probability). We consider an estimation problem for the states of a Markov chain in which the cost has a product structure. We assume that no observation is available and that the initial probability distribution is known. We define the RS-probability of a Markov chain as an information state for this estimation problem whose evolution is governed by a non-linear operator. The asymptotic behavior of RS-probability appears to be periodic. Asymptotic periodicity has been reported to emerge from random perturbations of dynamical systems governed by constrictive Markov integral operators [3], [4]. In our case, the Markov operator is given by a matrix, the perturbation has a simple non-linear structure, and the attractors can be explicitly calculated.

In Section 2, we view the probability distribution of a Markov chain as the information state of an additive optimization problem. RS-probability for Markov chains is introduced in Section 3. We show that its evolution is governed by an operator (denoted by $F^{\gamma}$) which can be viewed as a generalization of the usual linear Markov operator. The asymptotic behavior of this operator is also studied in Section 3, and a conjecture is proposed. Under mild conditions, it appears that RS-probability is asymptotically periodic. This periodic behavior is governed by a set of simultaneous quadratic equations.
2 Probability as an information state


In [2], [1] we studied the exponential (risk-sensitive) criterion for the estimation of HMM's and introduced risk-sensitive filter banks.

The probability distribution of a Markov chain, given only the initial distribution, determines the most “likely state” in the sense of MAP. In the context of Hidden Markov Models (HMM), the problem can be viewed as that of “pure prediction”, i.e., an HMM whose states are entirely hidden.

Define a Hidden Markov Model as a five-tuple $\langle X, Y, X, A, Q \rangle$; here $A$ is the transition matrix, $Y = \{1, 2, \ldots, N_Y\}$ is the set of observations, and $X = \{1, 2, \ldots, N_X\}$ is the finite set of (internal) states as well as the set of estimates or decisions. In addition, $Q := [q_{x,y}]$ is the $N_X \times N_Y$ state/observation matrix, i.e., $q_{x,y}$ is the probability of observing $y$ when the state is $x$. We consider the following information pattern. At decision epoch $t$, the system is in the (unobservable) state $X_t = i$ and the corresponding observation $Y_t$ is gathered, such that

$$P(Y_t = j \mid X_t = i) = q_{i,j}. \qquad (1)$$

The estimators $\hat X_t$ are functions of the observations $(Y_0, \ldots, Y_t)$ and are chosen according to some specified criterion. Consider a sequence of finite-dimensional random variables $X_t$ and the corresponding observations $Y_t$ defined on the common probability space $(\Omega, \mathcal{M}, P)$. Let $\hat X_t$ be a Borel measurable function of the observations up to $Y_t$, i.e., measurable with respect to the filtration they generate, denoted by $\mathcal{Y}_t$. The Maximum A Posteriori Probability (MAP) estimator is defined recursively: given $\hat X_0, \ldots, \hat X_{t-1}$, $\hat X_t$ is chosen such that the following sum is minimized:

$$E\Big[\sum_{i=0}^{t} \rho(X_i, \hat X_i)\Big], \qquad (2)$$

where

$$\rho(u, v) = \begin{cases} 0 & \text{if } u = v, \\ 1 & \text{otherwise.} \end{cases}$$
The usual definition of MAP as the argument with the greatest probability given the observations follows from the above [6]. The solution is well known; we need to define recursively an information state

$$\sigma_{t+1} = N_Y \cdot Q(Y_{t+1})\, A^T \cdot \sigma_t, \qquad (3)$$

where $Q(y) := \operatorname{diag}(q_{i,y})$ and $A^T$ denotes the transpose of the matrix $A$. $\sigma_0$ is set equal to $N_Y \cdot Q(Y_0)\, p_0$, where $p_0$ is the initial distribution of the state and is assumed to be known.

When no observation is available (equivalently, when every observation is equally likely in every state, $q_{x,y} = 1/N_Y$), it is easy to see that $N_Y \cdot Q(Y_t) = I$, where $I$ is the identity matrix. Thus, the information state for the prediction case evolves according to $\sigma_{t+1} = A^T \sigma_t$, which, when normalized, is simply the probability distribution of the chain. This “prediction” optimization problem for a multiplicative cost will be considered next.
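A minimal sketch of recursion (3) in plain Python, with states and observations indexed from 0. It assumes that “no observation” is modelled as an uninformative observation matrix with $q_{x,y} = 1/N_Y$, in which case the update reduces to $\sigma_{t+1} = A^T \sigma_t$; the helper names `info_state_step` and `map_estimate` and the numbers in the example are hypothetical.

```python
def info_state_step(A, Q, y, sigma):
    """One step of (3): sigma_{t+1} = N_Y * Q(y) * A^T * sigma_t.

    A: N_X x N_X row-stochastic transition matrix (list of rows).
    Q: N_X x N_Y matrix with Q[i][y] = q_{i,y}.
    y: index of the observation received at time t+1.
    """
    n_x, n_y = len(A), len(Q[0])
    pred = [sum(A[i][j] * sigma[i] for i in range(n_x)) for j in range(n_x)]  # A^T sigma_t
    return [n_y * Q[j][y] * pred[j] for j in range(n_x)]                      # N_Y Q(y) (...)


def map_estimate(sigma):
    """MAP estimate: index of the largest component of the information state."""
    return max(range(len(sigma)), key=lambda i: sigma[i])


A = [[0.8, 0.2], [0.4, 0.6]]
Q_none = [[0.5, 0.5], [0.5, 0.5]]  # assumed model of "no observation": N_Y * Q(y) = I
sigma = [1.0, 0.0]                 # hypothetical p_0 (so sigma_0 = p_0)
for t in range(1, 6):
    sigma = info_state_step(A, Q_none, 0, sigma)
    print(t, [round(s, 4) for s in sigma], "MAP:", map_estimate(sigma))
```

With `Q_none` the printed vectors are exactly the distributions of the chain; replacing `Q_none` by an informative matrix turns the same function into the unnormalized filter of (3).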
3 RS-Probability for Markov chains


With the notation of the previous section, given $\hat X_0, \ldots, \hat X_{t-1}$, define $\hat X_t$ recursively as the estimator which minimizes the product

$$E\Big[\prod_{i=0}^{t} \tilde\rho(X_i, \hat X_i)\Big], \qquad (4)$$

where

$$\tilde\rho(u, v) = \begin{cases} 1 & \text{if } u = v, \\ e^{\gamma} & \text{otherwise.} \end{cases}$$

Associate with each $i \in X$ a unit vector $e_i \in \mathbb{R}^{N_X}$ whose $i$th component is 1. Assume that no observation is available and that the initial probability distribution is given.

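Theorem 1: The estimator which minimizes (4) is given by

$$\hat X_t = \arg\max_{i \in X} \langle U_t, e_i \rangle,$$

where $U_t$ evolves according to

$$U_{t+1} = A^T \cdot H\Big(\operatorname{diag}\big(\tilde\rho(j, \hat X_t)\big)_{j \in X} \cdot U_t\Big) =: F^{\gamma}(U_t), \qquad (5)$$

with $H(x) := x / \sum_i x_i$ and $U_0 = p_0$. That is, the component of $U_t$ at the current estimate keeps weight 1, every other component is multiplied by $e^{\gamma}$, and the result is normalized before $A^T$ is applied.

Proof: See [2].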
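A minimal sketch of the operator $F^{\gamma}$ of (5) and the resulting estimator, in plain Python with states indexed from 0. Assumptions: `r` stands for $e^{\gamma}$; the weighting follows (4), i.e. the component at the current estimate keeps weight 1 and the others are multiplied by `r`; $H$ is taken to be normalization; and ties in the argmax go to the lowest index, which the text does not specify. `F_gamma` and `rs_map` are hypothetical names.

```python
def F_gamma(A, r, U):
    """One step of (5): U_{t+1} = A^T H(diag(rho~(j, X_hat_t)) U_t), with r = e^gamma.

    Assumed reading: the component at the current estimate keeps weight 1,
    all other components are multiplied by r, and H normalizes to sum 1.
    """
    n = len(U)
    k = max(range(n), key=lambda i: U[i])            # X_hat_t = argmax_i <U_t, e_i>; ties -> lowest index
    w = [U[j] * (1.0 if j == k else r) for j in range(n)]
    s = sum(w)
    w = [x / s for x in w]                           # H: normalize
    return [sum(A[i][j] * w[i] for i in range(n)) for j in range(n)]  # apply A^T


def rs_map(A, r, p0, T):
    """Risk-sensitive MAP estimates X_hat_0..X_hat_{T-1} (0-based) and U_T, from U_0 = p0."""
    U, estimates = list(p0), []
    for _ in range(T):
        estimates.append(max(range(len(U)), key=lambda i: U[i]))
        U = F_gamma(A, r, U)
    return estimates, U


A = [[0.2, 0.8], [0.6, 0.4]]   # the matrix of (9) in Section 3
est, U = rs_map(A, 100.0, [0.9, 0.1], 20)
print(est, [round(u, 3) for u in U])
```

With `r = 1` (i.e. $\gamma = 0$) the weighting is the identity and `F_gamma` reduces to the linear map $A^T$, the risk-neutral case.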


The operator $F^{\gamma}$ can be viewed as a non-linear generalization of the linear operator $A^T$. It can be shown that this operator plays a role in the estimation of risk-sensitive MAP for HMM's similar to that of the operator $A^T$ in the risk-neutral case. The purpose of this paper is to compare the asymptotic behavior of $F^{\gamma}$ and $A^T$.


It is well known that, under primitivity of the matrix $A$, the dynamical system defined by

$$p_{n+1} = A^T p_n \qquad (6)$$

converges, for every choice of the initial probability distribution $p_0$, to the $p^*$ which satisfies $A^T p^* = p^*$ [5].
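For comparison with $F^{\gamma}$, the linear iteration (6) can be run directly. This is a sketch with a fixed number of steps rather than a convergence test; the matrix is (15) from Property 2 below, used only as a sample primitive $A$, for which the limit is $p^* = (2/3, 1/3)^T$ by the two-state formula $p^*_1 = a_{21}/(a_{12} + a_{21})$. The name `iterate_linear` is hypothetical.

```python
def iterate_linear(A, p0, steps=500):
    """Run p_{n+1} = A^T p_n, eq. (6), for a fixed number of steps."""
    n = len(p0)
    p = list(p0)
    for _ in range(steps):
        p = [sum(A[i][j] * p[i] for i in range(n)) for j in range(n)]  # A^T p
    return p


A = [[0.8, 0.2], [0.4, 0.6]]
for p0 in ([1.0, 0.0], [0.0, 1.0], [0.5, 0.5]):
    print(p0, "->", [round(x, 6) for x in iterate_linear(A, p0)])
```

Every initial distribution is driven to the same $p^*$; this independence of the initial condition is the behavior against which $F^{\gamma}$ is compared.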

Definition: A Cycle of RS-Probability (CRP) is a finite set of probabilities $\{v^1, \ldots, v^m\}$ such that $F^{\gamma}(v^i) = v^{i+1}$ for $i = 1, \ldots, m-1$, with $F^{\gamma}(v^m) = v^1$; $m$ is called the period of the CRP.

We pose the following conjecture:

Conjecture: Let the stochastic matrix $A$ be primitive. Then, for every choice of the initial probability distribution $p_0$, the dynamical system

$$U_{t+1} = F^{\gamma}(U_t) \qquad (7)$$

is asymptotically periodic, i.e., $U_t$ approaches a CRP as $t \to \infty$ satisfying the equations

$$F^{\gamma}(v^1) = v^2, \quad F^{\gamma}(v^2) = v^3, \quad \ldots, \quad F^{\gamma}(v^m) = v^1. \qquad (8)$$

The condition (8) can be considered a generalization of the equation $A^T p^* = p^*$. It is not difficult to show that, in general, these equations are quadratic. Note that we do not exclude the case $m = 1$; then the CRP has only one element and $F^{\gamma}$ is asymptotically stationary. Next, we report a number of other properties of $F^{\gamma}$.


Property 1: Dependence of the asymptotic behavior on the initial condition. The asymptotic behavior of $F^{\gamma}$ may depend on the initial conditions; that is, depending on the initial condition, a different CRP may emerge. Let $A$ be given by

$$A = \begin{pmatrix} 0.2 & 0.8 \\ 0.6 & 0.4 \end{pmatrix}, \qquad e^{\gamma} = 100. \qquad (9)$$

Let the initial condition be given by $(u_1, u_2)$. There are two different CRP's, depending on the initial conditions:

$$F^{\gamma}(u) = u = \begin{pmatrix} 0.594 \\ 0.405 \end{pmatrix} \quad \text{if } u_1 > u_2, \qquad (10)$$

$$F^{\gamma}(v) = v = \begin{pmatrix} 0.214 \\ 0.785 \end{pmatrix} \quad \text{if } u_2 > u_1. \qquad (11)$$


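When is the asymptotic behavior independent of the initial condition? We believe this depends on the relation between the diagonal and off-diagonal elements of $A$. For example, consider

$$A = \begin{pmatrix} 0.6 & 0.4 \\ 0.25 & 0.75 \end{pmatrix}, \qquad e^{\gamma} = 10. \qquad (12)$$

For every initial condition, the CRP has two elements:

$$\text{CRP}: \{v^1, v^2\}, \qquad F^{\gamma}(v^1) = v^2, \quad F^{\gamma}(v^2) = v^1, \qquad (13)$$

$$v^1 = \begin{pmatrix} 0.283 \\ 0.716 \end{pmatrix}, \qquad v^2 = \begin{pmatrix} 0.534 \\ 0.465 \end{pmatrix}. \qquad (14)$$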


It appears that when the diagonal elements “dominate” the off-diagonal elements, the asymptotic behavior is independent of the initial condition. We have carried out a thorough investigation for stochastic matrices of dimension $6 \times 6$ and lower, and we suspect the property holds in higher dimensions. One could reason that large diagonal elements indicate a more “stable” dynamical system than one with high “cross-flow” among the states; the non-linear perturbation of a dynamical system with higher levels of cross-flow tends to “split” the stationary attractor. Understanding the precise behavior is an open problem, but below we describe some special cases.
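The two examples of Property 1 can be explored with a small simulation. The sketch below iterates $F^{\gamma}$ (read as in Theorem 1, with `r` standing for $e^{\gamma}$, normalization for $H$, and argmax ties going to the lowest index) past a fixed transient and then reports the shortest cycle found up to a tolerance. This is a heuristic, not a proof of convergence; the burn-in length, the tolerance and the helper name `limit_cycle` are assumptions, and the printed numbers come from this sketch rather than from the paper's own computations.

```python
def F_gamma(A, r, U):
    """One step of F^gamma, eq. (5), with r = e^gamma (assumed reading: see lead-in)."""
    n = len(U)
    k = max(range(n), key=lambda i: U[i])              # current estimate, ties -> lowest index
    w = [U[j] * (1.0 if j == k else r) for j in range(n)]
    s = sum(w)                                          # H: normalization constant
    return [sum(A[i][j] * w[i] / s for i in range(n)) for j in range(n)]


def limit_cycle(A, r, U0, burn_in=5000, max_period=100, tol=1e-9):
    """Heuristic CRP detection: list of cycle points reached from U0, or None."""
    U = list(U0)
    for _ in range(burn_in):                            # discard the transient
        U = F_gamma(A, r, U)
    cycle = [U]
    for _ in range(max_period):
        U = F_gamma(A, r, U)
        if max(abs(a - b) for a, b in zip(U, cycle[0])) < tol:
            return cycle                                # returned to the first cycle point
        cycle.append(U)
    return None


examples = {
    "(9), e^gamma = 100": ([[0.2, 0.8], [0.6, 0.4]], 100.0),
    "(12), e^gamma = 10": ([[0.6, 0.4], [0.25, 0.75]], 10.0),
}
for label, (A, r) in examples.items():
    for U0 in ([0.9, 0.1], [0.1, 0.9]):
        cyc = limit_cycle(A, r, U0)
        shown = None if cyc is None else [[round(x, 3) for x in v] for v in cyc]
        print(label, "from", U0, "->", shown)
```

Starting on either side of $u_1 = u_2$ gives different limits for (9), whereas for (12) both starting points end on a two-element cycle.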


Property 2: Dependence of the period on $\gamma$. Our simulations show that for small values of $\gamma$ the period is 1, i.e., $F^{\gamma}$ is asymptotically stationary. As $\gamma$ increases, periodic behavior may emerge; in the examples we have simulated, the period tends to increase with increasing $\gamma$ but then decreases for large values. So the most complex behavior occurs for mid-range values of $\gamma$. Consider

$$A = \begin{pmatrix} 0.8 & 0.2 \\ 0.4 & 0.6 \end{pmatrix}, \qquad (15)$$

and let $m$ be the period. Our simulations show that the period $m$ of the CRP's depends on the choice of $\gamma$; they result in the pairs $(e^{\gamma}, m)$: (2.1, 1), (2.7, 1), (2.9, 1), (3, 1), (3.01, 7), (3.1, 5), (3.3, 4), (3.9, 3), (10, 2), (21, 2). Even in two dimensions, the behavior of $F^{\gamma}$ is complex.
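The sweep over $e^{\gamma}$ for (15) can be reproduced in outline with the same kind of simulation. This is a sketch under assumed settings (fixed burn-in, sup-norm tolerance, argmax ties to the lowest index, `r` for $e^{\gamma}$); near the threshold $e^{\gamma} = 3$ the attractor lies close to the switching boundary $u_1 = u_2$, so the period detected there may be sensitive to these settings and need not match the list above. The helper `period` is hypothetical.

```python
def F_gamma(A, r, U):
    """One step of F^gamma, eq. (5), with r = e^gamma (assumed reading, ties -> lowest index)."""
    n = len(U)
    k = max(range(n), key=lambda i: U[i])
    w = [U[j] * (1.0 if j == k else r) for j in range(n)]
    s = sum(w)
    return [sum(A[i][j] * w[i] / s for i in range(n)) for j in range(n)]


def period(A, r, U0, burn_in=5000, max_period=1000, tol=1e-9):
    """Smallest m with ||U_{t+m} - U_t||_inf < tol after the burn-in, or None if not found."""
    U = list(U0)
    for _ in range(burn_in):
        U = F_gamma(A, r, U)
    ref = U
    for m in range(1, max_period + 1):
        U = F_gamma(A, r, U)
        if max(abs(a - b) for a, b in zip(U, ref)) < tol:
            return m
    return None


A15 = [[0.8, 0.2], [0.4, 0.6]]
for r in (2.1, 2.7, 2.9, 3.0, 3.01, 3.1, 3.3, 3.9, 10.0, 21.0):
    print(f"e^gamma = {r:<5} period = {period(A15, r, [0.9, 0.1])}")
```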


When does the periodic behavior emerge? The fixed point problem provides the answer. If the fixed point problem $F^{\gamma}(u) = u$ does not have a solution satisfying $0 < u < 1$ (componentwise), the asymptotic behavior cannot be stationary. In two dimensions, the equation $F^{\gamma}(u) = u = (u_1, u_2)^T$ is easy to write. Assume $u_1 > u_2$ (for the case $u_2 > u_1$, we transpose 1 and 2).


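Let

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \qquad (16)$$

and recall that $u_1 + u_2 = 1$. For $u_1 > u_2$ the current estimate is state 1, so by (5) the first component of $F^{\gamma}(u) = u$ reads $(a_{11}u_1 + e^{\gamma}a_{21}u_2)/(u_1 + e^{\gamma}u_2) = u_1$. This yields

$$(e^{\gamma} - 1)u_1^2 + (a_{11} - e^{\gamma}a_{21} - e^{\gamma})u_1 + a_{21}e^{\gamma} = 0, \qquad u_1 > u_2, \qquad (17)$$

and, by transposition,

$$(e^{\gamma} - 1)u_2^2 + (a_{22} - e^{\gamma}a_{12} - e^{\gamma})u_2 + a_{12}e^{\gamma} = 0, \qquad u_2 > u_1. \qquad (18)$$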

First, note that when $\gamma = 0$, we have

$$u_1(a_{12} + a_{21}) = a_{21}, \qquad (19)$$

which is linear and is the fixed point problem $A^T u = u$. For the above example (15), the equation (18) resulting from the assumption $u_2 > u_1$ has no acceptable root for any $e^{\gamma} > 1$: one root is greater than one and the other is smaller than 1/2, contradicting $u_2 > u_1$. Thus, stationarity requires that a solution to

$$(e^{\gamma} - 1)u_1^2 + u_1(0.8 - 0.4e^{\gamma} - e^{\gamma}) + 0.4e^{\gamma} = 0, \qquad u_2 < u_1, \qquad (20)$$

exist. One solution turns out to be greater than one. The other solution is plotted vs. $r = e^{\gamma}$ in Figure 1. The condition $u_2 < u_1$ fails for $e^{\gamma} > 3$, so for $e^{\gamma} > 3$ no stationary solution can exist. If the conjecture is correct, periodic behavior must emerge, which is exactly what we observed above. Based on the examples we have studied, this is a general property of $F^{\gamma}$ in two dimensions when the diagonal elements “dominate”.
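The curve behind Figure 1 can be computed in closed form. A minimal sketch, assuming $r = e^{\gamma} > 1$: the quadratic in (17) is positive at $u_1 = 0$ and equals $-a_{12} < 0$ at $u_1 = 1$, so exactly one root lies in $(0, 1)$ and the other exceeds one; the function returns the former. `stationary_candidate` is a hypothetical name, and the coefficients used below are those of (20).

```python
import math


def stationary_candidate(a11, a21, r):
    """Root in (0, 1) of (17): (r-1)u^2 + (a11 - r*a21 - r)u + a21*r = 0, for r = e^gamma > 1.

    The quadratic is positive at u = 0 and negative at u = 1, so the discriminant
    is positive and the smaller root is the one inside (0, 1).
    """
    a, b, c = r - 1.0, a11 - r * a21 - r, a21 * r
    return (-b - math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


# Matrix (15): a11 = 0.8, a21 = 0.4. A stationary point needs u1 > u2, i.e. u1 > 1/2.
for r in (1.5, 2.1, 2.5, 2.9, 3.1, 3.5, 5.0, 10.0):
    u1 = stationary_candidate(0.8, 0.4, r)
    print(f"r = {r:<4}  u1 = {u1:.4f}  acceptable: {u1 > 0.5}")
```

The printed root crosses 1/2 between $r = 2.9$ and $r = 3.1$, in line with the threshold $e^{\gamma} = 3$ stated above.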

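Let $a_{11} > a_{12}$ and $a_{22} > a_{21}$. Also assume, without loss of generality, that $a_{11} > a_{22}$. For a stationary solution to exist, as we showed above, (17) must have a solution. Let $\Delta = a_{11} - e^{\gamma}a_{21} - e^{\gamma}$. For small values of $\gamma$, the probability solution of (17) ($0 < u_1 < 1$) turns out to be

$$u_1 = \frac{-\Delta - \sqrt{\Delta^2 - 4a_{21}e^{\gamma}(e^{\gamma} - 1)}}{2(e^{\gamma} - 1)},$$

and since $u_2 < u_1$ implies $1/2 < u_1$, we must have

$$\frac{-\Delta - \sqrt{\Delta^2 - 4a_{21}e^{\gamma}(e^{\gamma} - 1)}}{2(e^{\gamma} - 1)} > \frac{1}{2}, \qquad (21)$$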
which after some simple algebra implies

$$e^{\gamma} < \frac{2a_{11} - 1}{1 - 2a_{21}}. \qquad (22)$$

Figure 1: The emergence of periodicity.
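The “simple algebra” leading from (21) to (22) can be spelled out as follows, under the standing assumptions $e^{\gamma} > 1$ and $a_{21} < a_{22}$ (so that $a_{21} < 1/2$). Write $f(u)$ for the left-hand side of (17):

$$\begin{aligned}
f(u) &= (e^{\gamma} - 1)u^2 + \Delta u + a_{21}e^{\gamma}, \qquad f(0) = a_{21}e^{\gamma} > 0, \qquad f(1) = a_{11} - 1 = -a_{12} < 0, \\
f\big(\tfrac{1}{2}\big) &= \tfrac{1}{4}\big[(e^{\gamma} - 1) + 2\Delta + 4a_{21}e^{\gamma}\big] = \tfrac{1}{4}\big[(2a_{11} - 1) - e^{\gamma}(1 - 2a_{21})\big].
\end{aligned}$$

Since $f$ opens upward and changes sign on $(0, 1)$, the root in (21) is the one in $(0, 1)$ and the other root exceeds one; hence the root in (21) exceeds $1/2$ exactly when $f(1/2) > 0$, which, after dividing by $1 - 2a_{21} > 0$, is (22). Equality $f(1/2) = 0$ corresponds to $u_1 = 1/2$, the threshold case.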


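If we plug in $a_{11} = 0.8$ and $a_{21} = 0.4$, we get $e^{\gamma} < 3$. If the conjecture is true, periods must appear for $e^{\gamma} > 3$. At $e^{\gamma} = (2a_{11} - 1)/(1 - 2a_{21})$, we get $u_1 = u_2 = 1/2$, which can be shown to be an acceptable stationary solution; hence $(2a_{11} - 1)/(1 - 2a_{21})$ is a sharp threshold. Our computations have been consistent with this result. For the case $a_{11} < a_{22}$, we obtain

$$e^{\gamma} < \frac{2a_{22} - 1}{1 - 2a_{12}}. \qquad (23)$$

Writing $a_{11} = 1/2 + \varepsilon$ and $a_{21} = 1/2 - \delta$ in (22) (and $a_{22} = 1/2 + \varepsilon$, $a_{12} = 1/2 - \delta$ in (23)), both results can be written as

$$e^{\gamma} < \frac{\varepsilon}{\delta}. \qquad (24)$$

The ratio $\varepsilon/\delta$ in (24) is a measure of sensitivity to risk.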


Periodicity seems persistent: once the periodic solutions emerge, increasing $e^{\gamma}$ does not seem to bring back stationary behavior. In two dimensions, for large values of $e^{\gamma}$, an interesting classification is possible. Given that the conjecture holds, an obvious sufficient condition for periodicity would be for the roots of (17) and (18) to be complex:

$$(a_{11} - e^{\gamma}a_{21} - e^{\gamma})^2 - 4(e^{\gamma} - 1)a_{21}e^{\gamma} < 0, \qquad (25)$$

$$(a_{22} - e^{\gamma}a_{12} - e^{\gamma})^2 - 4(e^{\gamma} - 1)a_{12}e^{\gamma} < 0. \qquad (26)$$
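Expanding the left-hand side of (25) in powers of $e^{\gamma}$ shows which term dominates for large $e^{\gamma}$:

$$\big(a_{11} - e^{\gamma}(1 + a_{21})\big)^2 - 4(e^{\gamma} - 1)a_{21}e^{\gamma} = e^{2\gamma}(1 - a_{21})^2 - 2\big(a_{11}(1 + a_{21}) - 2a_{21}\big)e^{\gamma} + a_{11}^2.$$

Since $a_{21} < 1$, the $e^{2\gamma}$ term is positive and dominates as $e^{\gamma} \to \infty$; this is the term retained in (27), and the same computation with indices 1 and 2 exchanged applies to (26).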
Condition on $A$ ($i \neq j$)              Period   Attractors
$a_{ii} < a_{ij}$ for both $i$              1        2
$a_{ii} > a_{ij}$ for both $i$              2        2
$a_{ii} < a_{ij}$, $a_{jj} > a_{ji}$        1        1

Figure 2: The classification in two dimensions.


But further inspection shows that, for sufficiently large values of $e^{\gamma}$, the leading terms of the inequalities give

$$e^{2\gamma}(1 - a_{21})^2 < 0, \qquad (27)$$

$$e^{2\gamma}(1 - a_{12})^2 < 0, \qquad (28)$$

which are clearly false, and so real roots exist. Other relations can be exploited to show that these roots are unacceptable and hence to demonstrate the existence of periodic attractors, as we show next. Consider the case $e^{\gamma} \gg a_{ij}$, $0 < a_{ij} < 1$. Then the fixed point problem (17) can be written as

$$e^{\gamma}u_1^2 - e^{\gamma}(1 + a_{21})u_1 + a_{21}e^{\gamma} = 0, \qquad (29)$$

$$u_1^2 - u_1(1 + a_{21}) + a_{21} = (u_1 - 1)(u_1 - a_{21}) = 0. \qquad (30)$$

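The solutions turn out to be $(1, 0)$ and $(a_{21}, a_{22})$. The solution $(1, 0)$ can be ruled out by the assumption $0 < a_{ij} < 1$, since every component of $F^{\gamma}(u)$ then lies strictly between 0 and 1. The assumption $u_1 < u_2$, (18), leads by transposition to the solutions $(0, 1)$ and $(a_{11}, a_{12})$. Thus, if we assume $a_{11} > a_{12}$ and $a_{22} > a_{21}$, both solutions can be ruled out ($(a_{21}, a_{22})$ violates $u_1 > u_2$, and $(a_{11}, a_{12})$ violates $u_2 > u_1$); for large values of $e^{\gamma}$ the fixed point problem with period one (the stationary case) does not have a solution, and the period must be 2 or more.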

If we assume $a_{21} > a_{22}$ and $a_{12} > a_{11}$, then both $(a_{21}, a_{22})$ and $(a_{11}, a_{12})$ are acceptable solutions. This was the case in (10) and (11); our computations show that there are in fact two stationary solutions, close to $(a_{21}, a_{22})$ and $(a_{11}, a_{12})$, depending on the initial conditions. Likewise, we can use the simultaneous quadratic equations to classify all the attractors in two dimensions which emerge with increasing $e^{\gamma}$. For the matrix $A$ given by (15), this behavior is already emergent at about $e^{\gamma} = 10$. Figure 2 shows the classification. It is possible to write down these equations in higher dimensions as simultaneous quadratic equations parameterized by $e^{\gamma}$. Classifying the solutions of these equations is an interesting open problem.
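The large-$e^{\gamma}$ argument above reduces to comparing entries of $A$, which a few lines of Python can make explicit. The sketch encodes only the limiting candidates $(a_{21}, a_{22})$ and $(a_{11}, a_{12})$ from (30) and their acceptability conditions; it assumes a $2 \times 2$ row-stochastic $A$ with $0 < a_{ij} < 1$, ignores ties, and does not compute the periodic attractors themselves. The function names are hypothetical.

```python
def large_gamma_stationary(A):
    """Acceptable stationary limits of F^gamma for a 2x2 row-stochastic A as e^gamma grows.

    From (30): u1 > u2 gives u = (a21, a22), acceptable only if a21 > a22;
    by transposition, u2 > u1 gives u = (a11, a12), acceptable only if a12 > a11.
    The roots (1, 0) and (0, 1) are excluded by the assumption 0 < a_ij < 1.
    """
    (a11, a12), (a21, a22) = A
    candidates = []
    if a21 > a22:
        candidates.append((a21, a22))
    if a12 > a11:
        candidates.append((a11, a12))
    return candidates


def classify(A):
    """Classification in the spirit of Figure 2 (large e^gamma only)."""
    n = len(large_gamma_stationary(A))
    if n == 0:
        return "no stationary solution: period >= 2"
    return f"period 1 with {n} stationary attractor(s)"


for name, A in {"(9)": [[0.2, 0.8], [0.6, 0.4]],
                "(12)": [[0.6, 0.4], [0.25, 0.75]],
                "(15)": [[0.8, 0.2], [0.4, 0.6]]}.items():
    print(name, large_gamma_stationary(A), "->", classify(A))
```

For (9) both candidates survive, while for (12) and (15) none does, which matches the two stationary solutions of (10)-(11) and the periodic behavior reported for the other two matrices.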

Property 3: Dependence of the period and the periodic attractors on transition probabilities. The dependence of the period on the transition probabilities is shown next. Let

$$A = \begin{pmatrix} 0.9 - \varepsilon & 0.1 & \varepsilon \\ 0.4 & 0.6 & 0.0 \\ 0.0 & \varepsilon & 1.0 - \varepsilon \end{pmatrix} \qquad (31)$$

and $e^{\gamma} = 101$. The CRP's appear to be independent of the initial conditions, but the period can depend strongly on $\varepsilon$, as Table 1 shows. Figure 3 shows the values of the first component of the RS-probability vs. time for $\varepsilon = 0.001$ (there are 2000 data points, hence some apparent overlaps). In Table 1, we see that as $\varepsilon \to 0$, the period grows rapidly, though not monotonically. One possible explanation is that $\varepsilon$ controls the mixing properties of (31): the matrix $A$ is primitive only for strictly positive values of $\varepsilon$, and as $\varepsilon$ approaches zero, (31) “approaches” a non-mixing dynamical system, and hence its stationary behavior becomes less “stable”.

Table 1: The periodic behavior for (31).

$\varepsilon$    0.1   0.01   0.008   0.006   0.004   0.002   0.001   0.0002
Period           4     4      29      21      17      78      430     682

Figure 3: The first component of RS-probability vs. time for $\varepsilon = 0.001$.
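The claim that (31) is primitive only for $\varepsilon > 0$ can be checked mechanically. The sketch below builds (31) with its rows read as $(0.9 - \varepsilon, 0.1, \varepsilon)$, $(0.4, 0.6, 0)$, $(0, \varepsilon, 1 - \varepsilon)$ and applies Wielandt's bound: a nonnegative $n \times n$ matrix is primitive if and only if its power $(n-1)^2 + 1$ is entrywise positive. `A_eps` and `is_primitive` are hypothetical helper names, and the check concerns only the zero pattern, not the periods in Table 1.

```python
def A_eps(eps):
    """The 3x3 matrix (31) for a given epsilon."""
    return [[0.9 - eps, 0.1, eps],
            [0.4, 0.6, 0.0],
            [0.0, eps, 1.0 - eps]]


def is_primitive(A):
    """Wielandt test: nonnegative n x n A is primitive iff A^((n-1)^2 + 1) > 0 entrywise.

    Only the zero pattern matters, so boolean matrix products are used.
    """
    n = len(A)
    P = [[A[i][j] > 0 for j in range(n)] for i in range(n)]
    M = P                                            # pattern of A^1
    for _ in range((n - 1) ** 2):                    # raise to the power (n-1)^2 + 1
        M = [[any(M[i][k] and P[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return all(all(row) for row in M)


for eps in (0.1, 0.001, 0.0):
    A = A_eps(eps)
    rows_ok = all(abs(sum(row) - 1.0) < 1e-12 for row in A)
    print(f"eps = {eps}: row-stochastic = {rows_ok}, primitive = {is_primitive(A)}")
```

At $\varepsilon = 0$ the third state no longer communicates with the first two, which is the non-mixing limit referred to above.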

The result suggests that the asymptotic behavior of non-primitive stochastic matrices under risk-sensitivity is interesting and merits investigation.
                Probability                          RS-Probability
Cost            additive                             multiplicative
Evolution       linear                               non-linear
Asymptotics     asymptotic stationarity              asymptotic periodicity
Attractors      $A^T p = p$,                         $F^{\gamma} v^1 = v^2, \ldots, F^{\gamma} v^m = v^1$,
                independent of initial conditions    dependent on initial conditions

Figure 4: Comparing $F^{\gamma}$ and $A^T$.


4 Conclusions

The risk-sensitive estimation of HMM's gives rise to a notion of probability for Markov chains arising from a non-linear generalization of the linear operator $A^T$, where $A$ is a row-stochastic primitive matrix. This operator, denoted by $F^{\gamma}$ in this paper, has a number of properties summarized in Figure 4. There is an interesting relation between the asymptotic behavior of $F^{\gamma}$ and a set of simultaneous non-linear equations, parameterized by $e^{\gamma}$, that determine the periodic solutions. We described this relation for the two-dimensional case in this paper and posed a series of open problems which are the subject of our further research.